Can deep effectiveness metrics be evaluated using shallow judgment pools?

X Lu; A Moffat; J Shane Culpepper

Conference Proceedings

Association for Computing Machinery (ACM) | Published : 2017

DOI: 10.1145/3077136.3080793

Abstract

© 2017 Copyright held by the owner/author(s).

Increasing test collection sizes and limited judgment budgets create measurement challenges for IR batch evaluations, challenges that are greater when using deep effectiveness metrics than when using shallow metrics, because of the increased likelihood that unjudged documents will be encountered. Here we study the problem of metric score adjustment, with the goal of accurately estimating system performance when using deep metrics and limited judgment sets, assuming that dynamic score adjustment is required per topic due to the variability in the number of relevant documents. We seek to induce system orderings that are as close as is possible to t..
